Effect of timeframes to define long term conditions and sociodemographic factors on prevalence of multimorbidity using disease code frequency in primary care electronic health records: retrospective study

Objective To determine the extent to which the choice of timeframe used to define a long term condition affects the prevalence of multimorbidity and whether this varies with sociodemographic factors.

Design Retrospective study of disease code frequency in primary care electronic health records.

Data sources Routinely collected, general practice, electronic health record data from the Clinical Practice Research Datalink Aurum were used.

Main outcome measures Adults (≥18 years) in England who were registered in the database on 1 January 2020 were included. Multimorbidity was defined as the presence of two or more conditions from a set of 212 long term conditions. Multimorbidity prevalence was compared using five definitions. Any disease code recorded in the electronic health records for 212 conditions was used as the reference definition. Additionally, alternative definitions for 41 conditions requiring multiple codes (where a single disease code could indicate an acute condition) or a single code for the remaining 171 conditions were as follows: two codes at least three months apart; two codes at least 12 months apart; three codes within any 12 month period; and any code in the past 12 months. Mixed effects regression was used to calculate the expected change in multimorbidity status and number of long term conditions according to each definition and associations with patient age, gender, ethnic group, and socioeconomic deprivation.

Results 9 718 573 people were included in the study, of whom 7 183 662 (73.9%) met the definition of multimorbidity where a single code was sufficient to define a long term condition. Variation was substantial in the prevalence according to timeframe used, ranging from 41.4% (n=4 023 023) for three codes in any 12 month period, to 55.2% (n=5 366 285) for two codes at least three months apart. Younger people (eg, 50-75% probability for 18-29 years v 1-10% for ≥80 years), people of some minority ethnic groups (eg, people in the Other ethnic group had higher probability than the South Asian ethnic group), and people living in areas of lower socioeconomic deprivation were more likely to be re-classified as not multimorbid when using definitions requiring multiple codes.

Conclusions Choice of timeframe to define long term conditions has a substantial effect on the prevalence of multimorbidity in this nationally representative sample. Different timeframes affect prevalence for some people more than others, highlighting the need to consider the impact of bias in the choice of method when defining multimorbidity.
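The five definitions compared in the abstract can be illustrated with a minimal Python sketch. It is not the study's code. The function name, the definition labels, the approximation of three and 12 months as 91 and 365 days, and the decision to ignore codes dated after the index date are all assumptions made for illustration. In the study the multiple-code definitions applied only to the 41 conditions that required them.

```python
from datetime import date, timedelta

INDEX_DATE = date(2020, 1, 1)        # registration date stated in the abstract
THREE_MONTHS = timedelta(days=91)    # assumption: months approximated in days
TWELVE_MONTHS = timedelta(days=365)  # assumption: months approximated in days


def meets_definition(code_dates, definition, index_date=INDEX_DATE):
    """Return True if one condition's code dates satisfy the named definition."""
    # Same-day codes count once; codes after the index date are ignored (assumption).
    dates = sorted({d for d in code_dates if d <= index_date})
    if not dates:
        return False
    if definition == "any_code":
        return True
    if definition == "two_codes_3_months_apart":
        # Some pair is far enough apart exactly when the extremes are.
        return dates[-1] - dates[0] >= THREE_MONTHS
    if definition == "two_codes_12_months_apart":
        return dates[-1] - dates[0] >= TWELVE_MONTHS
    if definition == "three_codes_within_12_months":
        # Slide a window over every run of three consecutive distinct dates.
        return any(dates[i + 2] - dates[i] <= TWELVE_MONTHS
                   for i in range(len(dates) - 2))
    if definition == "any_code_past_12_months":
        return index_date - dates[-1] <= TWELVE_MONTHS
    raise ValueError(f"unknown definition: {definition}")
```

For example, a condition coded on 1 March, 1 May and 1 August 2015 would satisfy the single code, three month and three-codes definitions, but not the 12 month gap or past 12 months definitions.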

1. A disease category indicates symptoms, e.g. urinary incontinence/dysmenorrhoea.
2. A disease category indicates a disease which may be acute or present as a single short-lived episode, e.g. gastritis, pancreatitis, sinusitis, depression, anxiety.
3. A disease category may be easily confused with another disease and be diagnosed in primary care clinically without investigations, e.g. skin disorders, asthma.
4. A disease category may indicate the result of a single blood test, e.g. anaemias (except for aplastic anaemia). The disease code of Chronic Kidney Disease (CKD) is not included here as the definition implies two tests at least 3 months apart.
Where these criteria were not met, a single code was deemed sufficient for diagnosis.

Data were de-duplicated based on disease categories on the same date, i.e., where two different codes indicating the same disease were recorded on the same day, only one was counted.
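The same-day de-duplication can be sketched in a few lines of Python. The record layout (patient identifier, disease category, code, event date) and the function name are hypothetical, and keeping the first code seen for each key is an assumption.

```python
def deduplicate_same_day(records):
    """Keep one record per (patient, disease category, date).

    `records` is an iterable of (patient_id, category, code, event_date)
    tuples; this layout is an assumption for illustration.
    """
    seen = set()
    kept = []
    for patient_id, category, code, event_date in records:
        key = (patient_id, category, event_date)
        if key not in seen:  # first code for this disease on this day wins
            seen.add(key)
            kept.append((patient_id, category, code, event_date))
    return kept
```

Two different asthma codes recorded for the same patient on the same day would therefore count once, while the same codes on different days would each be retained.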
For diabetes, many codes are used for both Type 1 and Type 2 diabetes, e.g., 'Diabetes mellitus' and 'O/E - diabetic maculopathy present both eyes'. Furthermore, the initial diagnosis may be misclassified, with the correct diabetes type later recorded. To avoid double or triple counting multimorbidity burden in people with diabetes, we selected the most likely diabetes type using an algorithm based on each patient's counts of Type 1, Type 2 and unspecified/other codes.

Four diseases in the CALIBER code lists were defined by blood tests alone (Raised Total Cholesterol, Raised LDL-C, Low HDL-C and Raised Triglycerides). We retained only values with abnormal readings defined in the CALIBER study and made distributional assumptions regarding the units (mmol/L or mg/dL) where the defined units were implausible (Table A1).
In the original CALIBER study, CKD also included blood test results, based on the eGFR, but we did not include those here, and included CKD only where a diagnostic code was present.
Supplemental material: BMJ Publishing Group Limited (BMJ) disclaims all liability and responsibility arising from any reliance placed on this supplemental material which has been supplied by the author(s). BMJMED e000474, doi: 10.1136/bmjmed-2022-000474.

5. Where a weight was recorded on a given date, but no height recording, the most recent older height record was used.
6. For remaining weight records without a matching height, the first available future height measurement was used.
7. Where BMI could not be calculated on a given date from a height or weight measurement, but a BMI was recorded in CPRD, the recorded BMI was used.
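Steps 5 to 7 can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's implementation: the function names and data layout are hypothetical, heights are assumed to be in metres and weights in kilograms, and a height recorded on the same date as the weight is treated as the closest earlier record.

```python
from bisect import bisect_right
from datetime import date


def match_height(weight_date, height_records):
    """Pick a height for a weight recorded on `weight_date`.

    `height_records` is a list of (date, height_m) tuples sorted by date.
    Step 5: use the most recent height on or before the weight date.
    Step 6: otherwise use the first available future height.
    """
    if not height_records:
        return None
    dates = [d for d, _ in height_records]
    pos = bisect_right(dates, weight_date)  # number of heights dated <= weight_date
    if pos > 0:
        return height_records[pos - 1][1]
    return height_records[0][1]


def bmi_on_date(weight_kg, weight_date, height_records, recorded_bmi=None):
    """Calculate BMI, falling back to a recorded BMI (step 7) if needed."""
    if weight_kg is not None:
        height_m = match_height(weight_date, height_records)
        if height_m:
            return weight_kg / height_m ** 2
    return recorded_bmi
```

A weight recorded before any height measurement exists is paired with the earliest later height, which mirrors step 6.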

Problem definition
In CPRD, a separate 'Problem' table records observation codes that have been marked as problems. All entries in the Problem table link to one (and only one) corresponding observation in the Observation table. We extracted all active problems in the Problem table that had a corresponding link to a code in the Observation table, using the obsid variable.
12.1% of our extracted observation codes had a link to a problem code.
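The linkage between the two tables can be sketched as a simple lookup on obsid. Only the obsid variable comes from the text above. The dictionary layout, the boolean `active` field and the function names are hypothetical stand-ins for however active problems are flagged in the extract.

```python
def observations_marked_as_problems(observations, problems):
    """Return observations whose obsid links to an active problem entry."""
    # `active` is a hypothetical flag, not a documented CPRD field name.
    active_obsids = {p["obsid"] for p in problems if p["active"]}
    return [o for o in observations if o["obsid"] in active_obsids]


def share_linked(observations, problems):
    """Proportion of observations with a linked active problem."""
    if not observations:
        return 0.0
    linked = observations_marked_as_problems(observations, problems)
    return len(linked) / len(observations)
```

Applied to the full extract, a calculation of this kind would give the proportion of observation codes with a problem link.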

Statistical analysis
1. Linear equation for the mixed effects logistic regression model, where j represents the GP practice and i represents each patient in practice j. The model is defined in terms of two binary indicators for having multimorbidity under the single code definition and under each of the alternative definitions, respectively, in patient i in practice j.
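One plausible way to write a model of this kind is sketched below. It is offered only as an illustration. All symbols are assumptions introduced here rather than the authors' notation: $Y_{ij}$ for the outcome, $M^{\text{single}}_{ij}$ and $M^{\text{alt}}_{ij}$ for the two binary indicators, $\mathbf{x}_{ij}$ for the covariates named in the abstract (age, gender, ethnic group, socioeconomic deprivation), and $u_j$ for a practice-level random intercept. The piecewise outcome is one reading consistent with the abstract's description of people being re-classified as not multimorbid.

```latex
% Illustrative sketch only; notation is assumed, not taken from the source.
\operatorname{logit}\,\Pr(Y_{ij} = 1)
  = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,
  \qquad u_j \sim \mathcal{N}(0, \sigma_u^2)

Y_{ij} =
  \begin{cases}
    1 & \text{if } M^{\text{single}}_{ij} = 1 \text{ and } M^{\text{alt}}_{ij} = 0 \\
    0 & \text{otherwise}
  \end{cases}
```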

2. Linear equation for the mixed effects negative binomial regression model, where j represents the GP practice and i represents each patient in practice j. The outcome is the change in the number of LTCs for patient i in GP practice j for each definition, compared to a single code definition.
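A count model of this kind could be written as in the sketch below. It is an illustration, not the authors' equation. All symbols are assumptions: $D_{ij}$ for the change in the number of LTCs, $N^{\text{single}}_{ij}$ and $N^{\text{alt}}_{ij}$ for the counts under each definition, $\mathbf{x}_{ij}$ for the patient covariates, $v_j$ for a practice-level random intercept, and $\alpha$ for the dispersion parameter. Writing the change as a non-negative reduction assumes the alternative definitions never yield more conditions than the single code definition.

```latex
% Illustrative sketch only; notation is assumed, not taken from the source.
D_{ij} = N^{\text{single}}_{ij} - N^{\text{alt}}_{ij},
  \qquad D_{ij} \sim \operatorname{NegBin}(\mu_{ij}, \alpha)

\log \mu_{ij}
  = \gamma_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + v_j,
  \qquad v_j \sim \mathcal{N}(0, \sigma_v^2)
```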

Algorithm for selecting the most likely diabetes type:

1. Count all occurrences of Type 1, Type 2 and unspecified/other codes for each patient
2. If only one type, then assign this as disease type
3. If Type 1 and no Type 2 codes, then assign as Type 1
4. If Type 2 and no Type 1 codes, then assign as Type 2
5. If a mix of Type 1 and Type 2 codes, then look within last 3 years of observations:
   a. If Type 1 and no Type 2 codes, then assign as Type 1
   b. If Type 2 and no Type 1 codes, then assign as Type 2
   c. If a mix of both Type 1 and Type 2 codes, then assign as unspecified
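The diabetes type steps can be expressed as a short Python function. This is a sketch under stated assumptions rather than the study's code. The function name and the labels `"type1"`, `"type2"` and `"unspecified"` are hypothetical, the three year window is approximated as 1096 days before a caller-supplied reference date, and a patient with neither Type 1 nor Type 2 codes in that window is assigned as unspecified, a case the steps do not cover.

```python
from collections import Counter
from datetime import date, timedelta


def assign_diabetes_type(codes, reference_date, lookback_days=1096):
    """Assign the most likely diabetes type from (event_date, kind) pairs.

    `kind` is one of "type1", "type2" or "unspecified" (hypothetical labels).
    """
    if not codes:
        return None
    counts = Counter(kind for _, kind in codes)      # step 1
    t1, t2 = counts["type1"], counts["type2"]
    if t1 and not t2:                                # steps 2 and 3
        return "type1"
    if t2 and not t1:                                # steps 2 and 4
        return "type2"
    if not t1 and not t2:                            # step 2: only unspecified codes
        return "unspecified"
    # Step 5: mixed Type 1 and Type 2 codes, so look at recent observations only.
    cutoff = reference_date - timedelta(days=lookback_days)
    recent = Counter(kind for d, kind in codes if cutoff <= d <= reference_date)
    r1, r2 = recent["type1"], recent["type2"]
    if r1 and not r2:                                # step 5a
        return "type1"
    if r2 and not r1:                                # step 5b
        return "type2"
    return "unspecified"                             # step 5c, or no recent typed codes
```

A patient with an old Type 1 code but only Type 2 codes in the three years before the reference date would be assigned Type 2, which reflects the point that an initial diagnosis may later be corrected.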

Prevalence of conditions requiring multiple codes under alternative timeframes for diagnosis (for conditions with at least 2% prevalence)

Prevalence of conditions using codes appearing in the problem table compared to a single code definition (for conditions with at least 1% prevalence)